ABSTRACT


Sentiment analysis is used in opinion mining. It helps businesses understand
customers' experience with a particular product by analyzing the emotional
tone of the product reviews they post, the online recommendations they make,
their survey responses and other forms of social media text. Businesses can
get feedback on how happy or sad the customer is, and use this insight to gain
a competitive edge. In this article, we explore how to conduct sentiment
analysis on a piece of text using some machine learning techniques. Python
happens to be one of the best programming languages when it comes to
machine learning, as it is easy to learn, is open source, and is effective in
catering to machine learning requirements like processing big datasets and
performing mathematical computations. Natural Language ToolKit (NLTK) is
one of the popular packages in Python that can be used for sentiment analysis.


KEYWORDS: Python, SQL, Hadoop, Database, Machine Learning, Natural Language


1. INTRODUCTION 

With the rapid growth of web technologies, the number of
people expressing their views and opinions via the web is
increasing. This information is useful to businesses,
governments and individuals alike. With 500+ million reviews
per day, Twitter is becoming a major source of information.
The input to our model is the raw data extracted from reviews.
For this purpose, we automate the process of text extraction
and categorize the text into two categories, i.e. positive or
negative. The user-generated content on Twitter covers
different kinds of products, events, people and political
affairs.


Performing sentiment analysis on text is considered
advantageous for the following reasons:

1. Texts are abstract in nature.

2. Analysis can be done in real time.

3. A vast variety of text is available for performing the analysis.


Our Proposal 

The proposed model for Twitter data analysis will be
implemented using Anaconda Python. Anaconda is an open
source distribution of the Python and R programming
languages for data science and machine learning
applications. It can be installed on Windows, Linux, and
macOS. Conda is an open source, cross-platform package
management system.


The texts can be analysed and characterized based on the
emotions expressed by social media users. We attempt to
classify the polarity of the text as either positive or negative.
If the text has both positive and negative elements, the more
dominant sentiment should be taken as the final label. We use
a dataset from Kaggle which was crawled and labelled
positive/negative. The data provided comes with emoticons,
usernames and hashtags, which need to be processed and
converted into a standard form. We also need to extract
useful features from the text, such as unigrams and bigrams,
which are a form of representation of the text.
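
The preprocessing and feature extraction steps described above can be
sketched as follows. This is a minimal illustration rather than the
exact pipeline used: the placeholder tokens (URL, USER_MENTION,
EMO_POS, EMO_NEG), the emoticon patterns and the function names are
assumptions introduced here, since the text does not specify the
standard form.

```python
import re
from collections import Counter

# Hypothetical emoticon patterns; a real pipeline would cover many more.
POSITIVE_EMOTICONS = re.compile(r"(:\)|:-\)|:D|;\)|<3)")
NEGATIVE_EMOTICONS = re.compile(r"(:\(|:-\(|:'\()")


def preprocess(text):
    """Normalize a raw tweet into a list of lowercase tokens."""
    text = re.sub(r"https?://\S+", " URL ", text)    # replace links
    text = re.sub(r"@\w+", " USER_MENTION ", text)   # replace usernames
    text = re.sub(r"#(\w+)", r" \1 ", text)          # keep hashtag word, drop '#'
    text = POSITIVE_EMOTICONS.sub(" EMO_POS ", text)
    text = NEGATIVE_EMOTICONS.sub(" EMO_NEG ", text)
    tokens = re.findall(r"[A-Za-z_']+", text)        # words and placeholders only
    return [token.lower() for token in tokens]       # placeholders are lowercased too


def ngrams(tokens, n):
    """Return all contiguous n-token tuples (n=1 unigrams, n=2 bigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    """Count the unigrams and bigrams that occur in one tweet."""
    tokens = preprocess(text)
    features = Counter(ngrams(tokens, 1))
    features.update(ngrams(tokens, 2))
    return features
```

Calling extract_features on a tweet returns a Counter keyed by token
tuples, which can then be turned into a feature vector for a
classifier.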


2. SOCIAL NETWORK ANALYSIS 

Social network analysis is the study of people's interactions
and communications on different topics, and it has received
growing attention in recent years. Millions of people give
their opinions on different topics on a daily basis on social
media such as Facebook and Twitter. It has many applications
in different areas of research, from social science to business.
Twitter is nowadays one of the most popular social media
platforms and, according to statistics, currently has over 300
million accounts. Twitter is a rich source for learning about
people's opinions and for sentiment analysis. For each text it
is important to determine whether its sentiment is positive,
negative, or neutral. Another challenge with Twitter is that
each tweet is limited to 140 characters, which causes people
to use phrases and words that standard language processing
tools do not cover. Recently Twitter has extended this limit.
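
As a rough illustration of deciding whether a text is positive,
negative, or neutral, the sketch below counts words from a tiny
hand-made lexicon and returns the dominant polarity. The word lists and
the function name label_polarity are hypothetical and serve only as an
example; they are not taken from the dataset or from any library.

```python
# Hypothetical toy lexicon, for illustration only.
POSITIVE_WORDS = {"good", "great", "love", "happy", "excellent"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "sad", "terrible"}


def label_polarity(text):
    """Label a text by comparing counts of positive and negative words."""
    # Split on whitespace and strip common punctuation from each word.
    words = [word.strip(".,!?\"'").lower() for word in text.split()]
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"  # no sentiment words, or an exact tie
```

A simple count like this ignores negation and slang, which is one
reason learned models are usually preferred for short, informal tweets.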


3. HYPOTHETICAL SUGGESTED APPROACH


Hypothetical Data: 
We assume a set of reviews, each labelled according to whether
or not it is ironic. A set of features is extracted from each
tweet and used to fit a learning algorithm that classifies the
tweet. The feature extraction is done to detect sarcasm in the
reviews.


Data collection:

Twitter allows us to collect the reviews. To collect sarcastic
tweets, we search for the #sarcasm hashtag, which the writer
adds to mark a tweet as sarcastic. However, other works have
stressed that this hashtag is used for only a few purposes.
The hashtag is therefore not a robust label, but it is mainly
used for the following purposes:

> Use as an anchor for search

> To mark irony in very subtle sarcasm, where the sarcasm is
extremely difficult to detect without a clear marker, as in
"It was fun today. For the first time in weeks! #sarcasm"
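
The hashtag-based labelling described above can be sketched as follows.
This is a minimal example under stated assumptions: the tweets are
assumed to be already available as a list of strings, and the function
name label_by_hashtag is hypothetical. The marker is removed from the
text after labelling so that a classifier trained on the result cannot
simply learn the hashtag itself.

```python
import re

# Matches the #sarcasm marker in any letter case, but not longer tags.
SARCASM_TAG = re.compile(r"#sarcasm\b", re.IGNORECASE)


def label_by_hashtag(tweets):
    """Return (text_without_marker, is_sarcastic) pairs for each tweet."""
    labelled = []
    for tweet in tweets:
        is_sarcastic = bool(SARCASM_TAG.search(tweet))
        cleaned = SARCASM_TAG.sub("", tweet)   # drop the marker itself
        cleaned = " ".join(cleaned.split())    # collapse leftover whitespace
        labelled.append((cleaned, is_sarcastic))
    return labelled
```

Because the hashtag is not a robust label, the resulting pairs are best
treated as noisy training data.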


RESULT AND DISCUSSION 
Fig User Registration

Fig Upload Dataset

Fig Search Query


CONCLUSION 

Text is analysed from different points of view to mine
opinion or sentiment. Our proposed approach classifies texts
as positive or negative, which supports sentiment analysis,
and uses that analysis for further decision making. For our
prototype, the Twitter API is used to gather data in real time.
The prototype back-end tests on retrieving and processing
the API data indicate that it is successful in gathering huge
amounts of data for popular search terms in real time. We
use various machine learning algorithms to conduct
sentiment analysis using the extracted features. However,
relying on individual models alone did not give high accuracy,
so we pick the top few models and combine them into a single
model.
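
One common way to combine the top few models is majority voting, and
the sketch below illustrates that idea. The text does not state how the
models are actually combined, so the voting rule, the function names
and the representation of a model as a plain callable returning a
label are all assumptions made for this example.

```python
from collections import Counter


def select_top_models(models, scores, k):
    """Return the k models with the highest validation scores."""
    ranked = sorted(zip(scores, range(len(models))), reverse=True)
    return [models[index] for _, index in ranked[:k]]


def majority_vote(models, text):
    """Return the label predicted by most of the given models."""
    votes = Counter(model(text) for model in models)
    return votes.most_common(1)[0][0]
```

With two labels, choosing an odd k avoids tied votes.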